ABSTRACT

A significant role of the testing specialist can be
to assist teachers in becoming better testmakers and users. The first
step in improving teachers' assessment instruments and techniques is
to try to get them to become articulate about their objectives and to
state them in concrete behavioral terms. Then the teacher needs to
examine his own test exercises to see if they encompass a realistic
range of transfer of learning and reflect the educational goals of
the course or program. The specialist must help the teacher find a
middle ground where this transferability is tested at several points
over a range of generalization and application within the broadly
defined boundaries of the subject area. Finally, the specialist can
give suggestions on item writing and editing. In the area of test
use, the problem is to bring both the skeptics and the unqualified
acceptors into a unity of tempered and qualified acceptance. Perhaps
the most important service that could be performed is to get every
test user to take a good hard look at the test, the test manual, and
the test norms. The specialist should try to foster an attitude
of watchful skepticism toward all assessments of pupils.
ABOUT THIS REPORT

Educational measurement lies at the heart of
some of the most important aspects of the
educational process. It represents a primary means for
revealing potential, organizing objectives,
stimulating effort, recognizing accomplishment, and
improving practice.

There is a clear need to support in every way
possible informed and responsible practice on the
part of all professionals whose interests and
positions give them occasion to use educational
measurement.

Consequently, the Board of Directors of NCME
has initiated this series of brief reports called
Measurement in Education. The purpose is to assist
in the dissemination of useful reports on
measurement techniques and implications in teaching,
guidance, and administration.

It is particularly appropriate that Robert
Thorndike is the first author in this series. His name has
long been associated with application of
measurement. Professor Thorndike's distinguished career at
Teachers College is studded with such highly
regarded publications as Personnel Selection,
Measurement and Evaluation in Psychology and Education,
and the Lorge-Thorndike Intelligence Tests.

As a member of the National Academy of
Education and as president of several national
organizations, he has been a leader of the field for many
years. This article is based upon a paper presented at
the National Testing Conference sponsored by the
New York State Education Department.
ROBERT L. THORNDIKE

How can the testing specialist best help teachers,
working on the educational firing line, to become
more effective in using tests and measurements? He
can make his contribution by helping them in the
two main ways in which they use tests and testing.

On the one hand, they make their own tests to
assess learning by their own pupils of the skills,
knowledge, and understanding that they (the
teachers) have been trying to impart. Teachers need
help to do this more skillfully. On the other, they
try from time to time to use information provided
by standardized testing programs in order to
understand better the pupils under their care, and to
adapt their teaching to those pupils. Teachers need
help to do this more wisely. We need to help them
to be better testmakers and better test users.



HELPING TEST MAKERS 



As the measurement and evaluation specialist
meets with teacher groups to work toward
improvement of their own assessment instruments and
techniques, his first step is usually to try to get
those teachers to become articulate about their
objectives and to state them in concrete behavioral
terms. This is very appropriate, for most teachers
(measurement specialists included) tend to take
their goals more or less for granted. The goals
remain implicit in the content of the subject matter,
the organization of the syllabus, or the sequence of
topics in the text. Paraphrasing Sir Edmund Hillary,
we sometimes teach what we teach simply because,
like Mt. Everest, "it is there."

When teachers become articulate about goals and
purposes of their teaching, these goals take on a much
more elegant and imposing form than just
"following the syllabus" or "covering the textbook."
Though areas of subject matter receive some
emphasis, a major part of the formulation is likely to
be devoted to such process outcomes as
understanding of principles and generalizations, ability to apply
learnings in life situations, improved skills of
scientific thinking and problem solving, and perhaps
changed attitudes and values.




Specifying Goals 

If left to their own devices, teachers are likely to
verbalize their goals in terms of broad generalities of
the type that I have just given. One of the tasks of
the measurement specialist is to keep nudging them
over to more specific and behavioral outcomes,
outcomes that are sufficiently delimited so that we
can agree as to what behaviors can be accepted as
representing them. Such specificity is important to
guide testmaking, but it is also important in giving
focus and direction to the teachers' teaching.

But do teachers really mean it when they set
forth objectives of understanding, applying,
generalizing, inferring, and problem solving? Or do they
merely put together an impressive set of phrases,
without really understanding or accepting what they
have committed themselves to, in order to pacify
the administration or the evaluation specialist who
has pressured them into making some explicit
statement of goals?

One way of appraising the reality of such
ambitious statements of objectives is to take a hard look
at the teacher's evaluation procedures. Do the test
exercises require the student to exhibit
understanding, or can he deal with them by reproducing
essentially unchanged what he has been taught in
class or what is presented in the book?

All too often, the latter appears to be the case. The
student is called upon to identify the definition given in the
words of the book, list the reasons as they were
given by the teacher, or solve the stock problem that
differs from the ones on which he has practiced only
in the values of the numbers that are involved.

Why does this happen? Doesn't the teacher know
what "understanding" and "application" mean so
far as testmaking is concerned? Is it too much of an
effort to make up good test exercises to test these
higher level abilities? Or does the teacher not really
consider them a "fair" assessment of his teaching or
of his pupils?



Measuring Understanding 



Perhaps all three are in some measure responsible,
and if this is so, it is to all three that the
measurement specialist must address himself. He must help
teachers to become aware of the characteristics that
are required in a test exercise if it is to measure
higher levels of intellectual process; he must fire up
the spark of interest in, and enthusiasm for, a more
sophisticated job of assessment; but in addition, he
must convince teachers that such assessments are
really appropriate and not an unfair venture into
material that is not "in the book," a sneaky
underhand trick.

The crucial indicator of a student's understanding
of a concept, a principle, or a procedure is that he is
able to apply it in circumstances that are different
from those under which it was taught.
Transferability is the key feature of meaningful learning. So if
we are to test for understanding, we must test in
circumstances that are at least in part new.


Does a child really know how to read a map? Try
him with one that is different from the one in the
book. Does he really understand denominate
numbers? Give him some problems phrased in "wugs,"
"pogs," and "pilzits," the units used in
measurement in the country of "Zoolumbia." (I hope that a
real "Zoolumbia" hasn't sprung into existence
recently without my being aware of it.) Does the Bill
of Rights mean anything to him except a lot of
words to be memorized? Ask him in what way
recently proposed laws to regulate the sale of
firearms might be considered unconstitutional.

Transfer is of all degrees of remoteness. Few
teachers would quarrel with the idea that a pupil
should be able to read and interpret maps differing
from the one in the text, though not all would take
the trouble to provide a new and different one.
More might be uncomfortable if the map dealt with
a fictitious country and still more if the legend on
the map introduced a whole set of new and different
symbols for features of terrain or culture. But each
of these variations represents a generalization of the
basic decoding operation, understanding of which
provides the foundation of any map reading.

The specialist, working with teachers, needs to
help them to appreciate the universality of the
ability to transfer learnings as a goal of education
and to define for themselves the range of
transferability in which they are really and realistically
interested as an outcome of instruction in their
course or program.

Too limited and meager transfer objectives will
make their courses sterile and their evaluation
barren. From the evaluation angle, we see this in
those tests that are made of such items as "When did
. . . ?", "Who did . . . ?", "Define . . .", "List . . .",
"Make a diagram of . . . labeling all the parts."

Too comprehensive and remote transfer goals will
be unrealistic and will call for evaluations that seem
to lack any meaningful relationship to what has
been taught and to be irrelevant and unfair. These
are likely to go completely outside the subject
matter of instruction. For example, one might test
the student of Latin on his mastery of English
vocabulary or grammar, an outcome that used to be
stated as one of the objectives of Latin. Or one
might test the geometry student on his ability to
identify faulty reasoning in political arguments, that
is, a generalized improvement in logical reasoning.

Evaluations such as these will be rejected by
pupils and teachers alike. The specialist must help
the teacher or group of teachers to find a middle
ground where transfer of learnings is tested at
several points over a range of generalization and
application, from that which represents a minimal
change from the specifics of what was taught to that
which pushes the realistic limits of able students
within the broadly defined boundaries of the subject
area.



Writing Better Items 

There are, of course, the editorial tricks of the
trade: the "do's" and "don'ts" of test item writing.
These represent the accumulated "know how" of
academic generations of testmakers and have about
the same standing as the formulas for good writing
that appear in a freshman composition manual.
Some are matters of convention, some of good taste,
and some are distillations of wisdom on clear and
effective communication. There will always be room
for improved communication skills, whether these
be the skills of writing a simple exposition or the
skills of formulating a precisely stated test task.

Within the limits of their time and personal
resources, measurement specialists who have
mastered the "grammar" and stylistics of item writing
can serve a useful function by communicating this
knowledge and skill as widely as possible among
those who day by day and week by week perpetrate
the ambiguities and irrelevant complexities that get
inflicted on hapless and helpless pupils. No need to
go into further specifics here. Fuzzy, unclear,
unnecessarily complex writing is bad writing whenever
we encounter it, and we should combat it with the
best strategy that we can bring to bear.

Two components of strategy that seem important
are cooperative test preparation and continuing test
analysis. There is no antidote to ambiguity quite so
powerful as review by an independent reader, and
no tonic quite so effective, over the long haul, as a
routine practice of analyzing the responses to test
exercises and accumulating the results of this
analysis as a basis for subsequent item selection.
Cooperation in the preparation and use of test materials can
flourish only in a school setting where there is a
climate of cooperative functioning, but the test
specialist can try at each point to direct this
cooperation to the testmaking function. The analysis of
test results, and assembly of item files, is a corollary
of this cooperation. It becomes increasingly
practical as scoring machines and/or computers become
more widely available in and near school systems.
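
As a rough illustration of the routine response analysis mentioned above, the sketch below computes two summary figures for each item: the proportion of pupils answering it correctly, and the difference in that proportion between the highest-scoring and lowest-scoring pupils. Neither index is named in this article; they are common classroom conventions assumed here for illustration, as are the 27 percent grouping fraction, the function name, and the small response table.

```python
from statistics import mean

def item_analysis(responses, fraction=0.27):
    """Summarize each item from a table of 0/1 scores.

    responses: one list per pupil, one 0/1 entry per item.
    fraction:  share of pupils placed in the upper and lower
               groups (an assumed convention, not from the article).
    Returns a list of (difficulty, discrimination) pairs.
    """
    n_items = len(responses[0])
    # Rank pupils by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for i in range(n_items):
        # Difficulty: proportion of all pupils answering correctly.
        difficulty = mean(r[i] for r in responses)
        # Discrimination: upper-group minus lower-group proportion.
        discrimination = mean(r[i] for r in upper) - mean(r[i] for r in lower)
        results.append((difficulty, discrimination))
    return results

# Hypothetical scores for six pupils on three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
results = item_analysis(responses)
for number, (difficulty, discrimination) in enumerate(results, start=1):
    print(f"item {number}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f}")
```

Figures of this kind, filed with each item after every administration, are one possible form of the accumulated record that later item selection can draw on.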

But item writing and item editing deal with
matters of form. The substance is what is tested, and
this is where the important possibilities for change
lie.



HELPING TEST USERS 



In the matter of the teacher's use of standardized
tests, it is hard to know where to start. Perhaps this
is because teachers seem to deviate from one's ideal
in test interpretation in two diametrically opposed
directions, and one must deal with both extremes.
On the one hand, there is the group of teachers who
view standardized testing as a meaningless
enterprise, not only inaccurate but also irrelevant to
the genuine learning tasks in the school. At the
other end there is the (I suspect larger) group who
consider a grade-equivalent on a standardized test to
be the infallible revelation of divine truth.
Somehow, we must bring the two tails of the distribution
of response to standardized testing (tails that
sometimes look more like the twin humps of a
dromedary) back together into a unity of tempered and
qualified acceptance.

A Good Hard Look 

I sometimes think the most important service we
could achieve is somehow to get every test user or
interpreter to take a good hard look at the test
whose score he is proposing to use or interpret. A
good hard look means a look inside the test book at
the tasks and items, not just at the title on the
cover. A diagnostic test of poetry reading looks less
exciting when scrutiny shows it to be a highly
analytic test of the meaning of words and phrases in
a single poem. A good hard look means a look at the
manual and the test norms. A difference of half a
grade in the grade norms for a test somehow shrinks
back into proportion when it is seen as just two
more items answered correctly and when the
standard error of measurement is seen to be three
raw-score points.
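
The arithmetic behind that last comparison can be sketched as follows. The two-item gap and the three-point standard error of measurement come from the example above; the observed score of 30, the one-standard-error band, and the function names are hypothetical. The formula for the standard error of a difference assumes that the errors in the two scores are independent.

```python
import math

SEM_RAW = 3.0    # standard error of measurement, raw-score points (from the text)
SCORE_GAP = 2.0  # "two more items answered correctly" (from the text)

def score_band(observed, sem, n_sem=1.0):
    """Range of raw scores within n_sem standard errors of the observed score."""
    return (observed - n_sem * sem, observed + n_sem * sem)

def se_difference(sem_a, sem_b):
    """Standard error of the difference between two scores,
    assuming their measurement errors are independent."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

# A hypothetical pupil with an observed raw score of 30.
band = score_band(30, SEM_RAW)
se_diff = se_difference(SEM_RAW, SEM_RAW)
gap_in_se_units = SCORE_GAP / se_diff

print(f"score band: {band[0]:.0f} to {band[1]:.0f}")
print(f"standard error of a difference: {se_diff:.2f}")
print(f"two-item gap in standard-error units: {gap_in_se_units:.2f}")
```

Under these assumptions the two-item gap is less than half of one standard error of a difference, which is the sense in which half a grade "shrinks back into proportion."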

At the same time that we educate teachers to
look at the test whose results they are preparing to
interpret (not just at its name or what the authors
say about it), perhaps we can persuade them to
examine test results in relation to other facts that
are available for their class or for a pupil in it. It
happens only occasionally, but it does happen in real life,
that test results are patently absurd. Scorers have
been known to use the wrong scoring key; the most
elaborate automated test-scoring systems may
occasionally be fed an incorrect pupil identification.
The cold eye of common sense will identify enough
instances when a test result should be verified, if
that is possible, or else disregarded, to make the
habit of critical examination of test results a
thoroughly worthy one.

But this same spirit of skepticism needs to extend
equally to the teacher's personal appraisal of pupils.
The teacher who is most critical of standardized
testing is often endowed with unlimited faith in the
accuracy of his own judgments. He knows! It is
vitally important that we do not, in identifying the
shortcomings of test data, manage at the same time
to build up the teacher's view that his own judgment
is infallible. As we develop caution in accepting and
interpreting test results, we should try to generalize
this to an attitude of watchful skepticism toward all
assessments of pupils from whatever source. If we
can show teachers how to take this hard, analytical
look, and can motivate them to do so, we will have
made a good start toward overcoming the serious
misuses and misinterpretations to which not only
standardized test results but all pupil appraisals are
now subject.

One final problem of strategy before I close: How
to mobilize limited evaluational expertise so as to
make the greatest and most lasting impact on the
educational scene? We know that many, perhaps
most, of the teachers now in a school system will
not be there 3 or 4 years from now. The turnover of
teacher personnel is distressingly high. We also know
that forgetting of what has been briefly presented
and incompletely learned is discouragingly rapid.
How can the specialist achieve an impact that will
last beyond the immediate situation and the present
crop of teachers?

Sustained Contact 

I don't know, but I suspect that he will not
achieve it by sporadic visits to a school system, or
by occasional and somewhat casual contacts with
the school staff. I suspect that it is most likely to
come about if one or more persons in the school,
already of sufficiently long tenure so that their
continuation in education and at the school is
probable, are brought together with their
counterparts from other school systems for a workshop of
sufficient intensity and duration that they become
permanently infected with the evaluation virus, and
will return to be a focus of infection within their
own school system. It is through this channel, I
suspect, that the influence of the measurement
specialist can be most effectively spread throughout
the length and breadth of the land.